BitPolar
Near-optimal vector quantization with zero training overhead
Compress embeddings to 3-8 bits with provably unbiased inner products and no calibration data. Implements TurboQuant (ICLR 2026), PolarQuant (AISTATS 2026), and QJL (AAAI 2025) from Google Research.
Key Properties
- Data-oblivious — no training, no codebooks, no calibration data
- Deterministic — fully defined by 4 integers: (dimension, bits, projections, seed)
- Provably unbiased — inner product estimates satisfy E[estimate] = exact inner product at 3+ bits
- Near-optimal — distortion within ~2.7x of the Shannon rate-distortion limit
- Instant indexing — vectors compress on arrival, 600x faster than Product Quantization
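The unbiasedness claim can be illustrated with a pure-NumPy toy version of a QJL-style 1-bit estimator (a conceptual sketch, not the library's implementation): store only the vector's norm plus one sign bit per random projection, and rescale using the identity E[sign(g·v)(g·q)] = sqrt(2/π)·(v·q)/‖v‖ for Gaussian g.

```python
import numpy as np

rng = np.random.default_rng(42)
d, m = 128, 8192          # dimension, number of 1-bit projections (large m for a tight demo)

v = rng.standard_normal(d)
q = rng.standard_normal(d)

G = rng.standard_normal((m, d))   # seeded random projections, shared by encoder and scorer
sketch = np.sign(G @ v)           # the stored code: 1 sign bit per projection
norm_v = np.linalg.norm(v)        # stored alongside the bits

# E[sign(g.v) * (g.q)] = sqrt(2/pi) * (v.q) / ||v||, so rescaling gives an
# unbiased estimate of the inner product without ever decoding v
est = norm_v * np.sqrt(np.pi / 2) * np.mean(sketch * (G @ q))
print(est, v @ q)                 # the two agree up to O(1/sqrt(m)) noise
```

With m projections the estimator's standard deviation shrinks like 1/sqrt(m); the real library pairs a small residual sketch like this with a multi-bit first stage, so far fewer bits are needed.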
What's New in 0.3.x
- 58 integrations — every major AI framework, vector database, and ML library
- PyTorch torchao — embedding quantizer, BitPolarLinear, KV cache
- FAISS drop-in — API-compatible IndexBitPolarIP/L2 replacement
- LlamaIndex, Haystack, DSPy — VectorStore and Retriever integrations
- Agentic AI — LangGraph, CrewAI, OpenAI Agents, Google ADK, SmolAgents, PydanticAI
- Agent memory — Mem0, Zep, Letta backends
- 11 vector databases — Milvus, Weaviate, Pinecone, Redis, ES, DuckDB, SQLite, and more
- LLM inference — llama.cpp, SGLang, TensorRT, Ollama, MLX KV cache compression
- ML frameworks — JAX/Flax, TensorFlow/Keras, scikit-learn pipeline
- 30 Python examples covering all integrations
- Walsh-Hadamard Transform — O(d log d) rotation with O(d) memory (577x less than Haar QR)
- Python bindings — PyO3 + maturin, zero-copy numpy integration
- WASM bindings — browser-side vector search via wasm-bindgen
- no_std support — embedded/edge deployment with the alloc feature
Quick Start
Rust

```toml
[dependencies]
bitpolar = "0.3"
```

```rust
use bitpolar::TurboQuantizer;
use bitpolar::traits::VectorQuantizer;

// Create quantizer from 4 integers — no training needed
let q = TurboQuantizer::new(128, 4, 32, 42).unwrap();

// Encode a vector
let vector = vec![0.1_f32; 128];
let code = q.encode(&vector).unwrap();

// Estimate inner product without decompression
let query = vec![0.05_f32; 128];
let score = q.inner_product_estimate(&code, &query).unwrap();

// Decode back to approximate vector
let reconstructed = q.decode(&code);
```
Python

```bash
pip install bitpolar
```

```python
import numpy as np
import bitpolar

# Create quantizer — no training needed
q = bitpolar.TurboQuantizer(dim=768, bits=4, projections=192, seed=42)

# Encode/decode
embedding = np.random.randn(768).astype(np.float32)
code = q.encode(embedding)
decoded = q.decode(code)

# Build a search index (embeddings/query are placeholder data)
embeddings = np.random.randn(1000, 768).astype(np.float32)
query = np.random.randn(768).astype(np.float32)
index = bitpolar.VectorIndex(dim=768, bits=4)
for i, vec in enumerate(embeddings):
    index.add(i, vec)
ids, scores = index.search(query, top_k=10)
```
JavaScript (WASM)

```javascript
import init, { WasmQuantizer, WasmVectorIndex } from 'bitpolar-wasm';
await init();

const q = new WasmQuantizer(128, 4, 32, 42n);
const vector = new Float32Array(128).fill(0.1);
const code = q.encode(vector);
const decoded = q.decode(code);

const index = new WasmVectorIndex(128, 4, 32, 42n);
index.add(0, vector);
const query = new Float32Array(128).fill(0.05);
const results = index.search(query, 5);
```
Walsh-Hadamard Transform
The WHT provides an O(d log d) alternative to Haar QR rotation:
| Property | Haar QR (0.1.x) | Walsh-Hadamard (0.2.x+) |
|---|---|---|
| Time complexity | O(d²) | O(d log d) |
| Memory | O(d²) — 2.3 MB @ d=768 | O(d) — 4 KB @ d=768 |
| Quality | Exact Haar distribution | Near-Haar (JL guarantees) |
| Deterministic | Yes (seed-based) | Yes (seed-based) |
```rust
use bitpolar::wht::WhtRotation;
use bitpolar::traits::RotationStrategy;

let wht = WhtRotation::new(768, 42).unwrap();
let rotated = wht.rotate(&embedding);
let recovered = wht.rotate_inverse(&rotated);
```
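For intuition, a seeded WHT-based rotation can be mimicked in a few lines of NumPy (a conceptual sketch under my own naming, not BitPolar's code): flip signs with a seeded RNG, then apply the normalized fast Walsh-Hadamard butterfly, which runs in O(d log d), needs no d×d matrix, and is its own inverse.

```python
import numpy as np

def fwht(x):
    """Normalized fast Walsh-Hadamard transform, O(d log d); len(x) must be a power of two."""
    x = x.astype(np.float64).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a = x[i:i + h].copy()
            x[i:i + h] = a + x[i + h:i + 2 * h]
            x[i + h:i + 2 * h] = a - x[i + h:i + 2 * h]
        h *= 2
    return x / np.sqrt(len(x))   # orthonormal scaling: the matrix H satisfies H @ H = I

def rotate(x, seed):
    """Seeded randomized rotation: random sign flips followed by the WHT."""
    signs = np.where(np.random.default_rng(seed).random(len(x)) < 0.5, -1.0, 1.0)
    return fwht(signs * x)

def rotate_inverse(y, seed):
    signs = np.where(np.random.default_rng(seed).random(len(y)) < 0.5, -1.0, 1.0)
    return signs * fwht(y)        # undo: WHT is self-inverse, then flip the signs back

x = np.random.default_rng(0).standard_normal(1024)
y = rotate(x, seed=42)
# the norm is preserved and the rotation is exactly reproducible from (dim, seed) alone
print(np.linalg.norm(x), np.linalg.norm(y))
```

Because the rotation is fully determined by the seed, only the seed needs to be stored or shared, which is what makes the "4 integers" determinism above possible.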
API Overview
Core Quantizers
| Type | Description | Use Case |
|---|---|---|
| TurboQuantizer | Two-stage (Polar + QJL) | Primary API — best quality |
| PolarQuantizer | Polar coordinate encoding | Simpler, fallback option |
| QjlQuantizer | 1-bit JL sketching | Residual correction |
| WhtRotation | Walsh-Hadamard rotation | Fast, memory-efficient rotation |
Specialized Wrappers
| Type | Description |
|---|---|
| KvCacheCompressor | Transformer KV cache compression |
| MultiHeadKvCache | Multi-head attention KV cache |
| TieredQuantization | Hot (8-bit) / Warm (4-bit) / Cold (3-bit) |
| ResilientQuantizer | Primary + fallback for production robustness |
| OversampledSearch | Two-phase approximate + exact re-ranking |
| DistortionTracker | Online quality monitoring (EMA MSE/bias) |
Language Bindings
| Package | Install | Language |
|---|---|---|
| bitpolar | cargo add bitpolar | Rust |
| bitpolar | pip install bitpolar | Python (PyO3) |
| @mmgehlot/bitpolar-wasm | npm install @mmgehlot/bitpolar-wasm | JavaScript (WASM) |
| @mmgehlot/bitpolar | npm install @mmgehlot/bitpolar | Node.js (NAPI-RS) |
| bitpolar-go | go get github.com/mmgehlot/bitpolar/... | Go (CGO) |
| bitpolar | Maven Central | Java (JNI) |
| bitpolar-pg | cargo pgrx install | PostgreSQL |
58 Integrations — Every Major AI Framework
BitPolar provides a single quantization core with adapters for every major framework in the AI/ML ecosystem.
RAG & Search Frameworks
| Integration | Package | Description |
|---|---|---|
| LangChain | langchain_bitpolar | VectorStore with compressed similarity search |
| LlamaIndex | llamaindex_bitpolar | BasePydanticVectorStore for LlamaIndex |
| Haystack | bitpolar_haystack | DocumentStore + Retriever component |
| DSPy | bitpolar_dspy | Retriever module for DSPy pipelines |
| FAISS | bitpolar_faiss | Drop-in replacement for faiss.IndexFlatIP/L2 |
| ChromaDB | bitpolar_chroma | EmbeddingFunction + two-phase search store |
Agentic AI Frameworks
| Integration | Package | Description |
|---|---|---|
| LangGraph | bitpolar_langgraph | Compressed checkpoint saver for stateful agents |
| CrewAI | bitpolar_crewai | Memory backend for agent teams |
| OpenAI Agents SDK | bitpolar_openai_agents | Function-calling tools for OpenAI agents |
| Google ADK | bitpolar_google_adk | Tool for Google Agent Development Kit |
| Anthropic MCP | bitpolar_anthropic | MCP server (stdio + SSE) for Claude |
| AutoGen | bitpolar_autogen | Memory store for Microsoft agents |
| SmolAgents | bitpolar_smolagents | HuggingFace agent tool |
| PydanticAI | bitpolar_pydantic_ai | Type-safe Pydantic tool definitions |
| Agno (Phidata) | bitpolar_agno | Knowledge base for high-perf agents |
Agent Memory Frameworks
| Integration | Package | Description |
|---|---|---|
| Mem0 | bitpolar_mem0 | Vector store backend for Mem0 |
| Zep | bitpolar_zep | Compressed store with time-decay scoring |
| Letta (MemGPT) | bitpolar_letta | Archival memory tier |
Vector Databases
| Integration | Package | Description |
|---|---|---|
| Qdrant | bitpolar_embeddings.qdrant | Two-phase HNSW + BitPolar re-ranking |
| Milvus | bitpolar_milvus | Client-side compression with reranking |
| Weaviate | bitpolar_weaviate | Client-side compression with reranking |
| Pinecone | bitpolar_pinecone | Metadata-stored compressed codes |
| Redis | bitpolar_redis | Byte string storage with pipeline search |
| Elasticsearch | bitpolar_elasticsearch | kNN search + BitPolar reranking |
| PostgreSQL | bitpolar-pg | Native pgrx extension (SQL functions) |
| DuckDB | bitpolar_duckdb | BLOB storage with SQL queries |
| SQLite | bitpolar_sqlite_vec | Zero-dependency embedded vector search |
| Supabase | bitpolar_supabase | Serverless pgvector compression |
| Neon | bitpolar_neon | Serverless Postgres driver |
LLM Inference Engines (KV Cache)
| Integration | Package | Description |
|---|---|---|
| vLLM | bitpolar_vllm | KV cache quantizer + DynamicCache |
| HuggingFace Transformers | bitpolar_transformers | Drop-in DynamicCache replacement |
| llama.cpp | bitpolar_llamacpp | KV cache compression |
| SGLang | bitpolar_sglang | RadixAttention cache compression |
| TensorRT-LLM | bitpolar_tensorrt | KV cache quantizer plugin |
| Ollama | bitpolar_ollama | Embedding compression client |
| ONNX Runtime | bitpolar_onnx | Model embedding quantizer |
| Apple MLX | bitpolar_mlx | Apple Silicon quantizer |
ML Frameworks
| Integration | Package | Description |
|---|---|---|
| PyTorch | bitpolar_torch | Embedding quantizer, BitPolarLinear, KV cache |
| PyTorch (native) | bitpolar_torch_native | PT2E quantizer backend |
| JAX/Flax | bitpolar_jax | JAX array compression + Flax module |
| TensorFlow | bitpolar_tensorflow | Keras layers for compression |
| scikit-learn | bitpolar_sklearn | TransformerMixin for sklearn pipelines |
Cloud & Enterprise
| Integration | Package | Description |
|---|---|---|
| Spring AI | BitPolarVectorStore.java | Java VectorStore for Spring Boot |
| Vercel AI SDK | bitpolar_vercel | Embedding compression middleware |
| AWS Bedrock | bitpolar_bedrock | Titan/Cohere embedding compression |
| Triton | bitpolar_triton | NVIDIA Inference Server backend |
| gRPC | bitpolar-server | Language-agnostic compression service |
| MCP | bitpolar_mcp | AI coding assistant tool server |
| CLI | bitpolar-cli | Command-line compress/search/bench |
How It Works
```
Input f32 vector
        │
        ▼
┌─────────────────┐
│ Random Rotation │  WHT (O(d log d)) or Haar QR (O(d²))
│                 │  Spreads energy uniformly across coordinates
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ PolarQuant      │  Groups d dims into d/2 pairs → polar coords
│ (Stage 1)       │  Radii: lossless f32 │ Angles: b-bit quantized
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ QJL Residual    │  Sketches reconstruction error
│ (Stage 2)       │  1 sign bit per projection → unbiased correction
└────────┬────────┘
         │
         ▼
TurboCode { polar: PolarCode, residual: QjlSketch }
```

Inner product estimation: ⟨v, q⟩ ≈ IP_polar(code, q) + IP_qjl(sketch, q)
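The Stage-1 polar step can be sketched in NumPy (an illustrative toy with hypothetical helper names, not the crate's exact quantizer): pair adjacent coordinates, keep each pair's radius exactly, and spend the b bits on the angle. In the full pipeline the QJL sign sketch then corrects the remaining reconstruction error.

```python
import numpy as np

def polar_encode(v, bits):
    """Toy Stage 1: d/2 (radius, angle) pairs; radii kept lossless, angles quantized to b bits."""
    pairs = v.reshape(-1, 2)
    r = np.hypot(pairs[:, 0], pairs[:, 1])          # lossless radii
    theta = np.arctan2(pairs[:, 1], pairs[:, 0])    # angles in [-pi, pi)
    levels = 2 ** bits
    codes = np.round((theta + np.pi) / (2 * np.pi) * levels).astype(int) % levels
    return r, codes, bits

def polar_decode(code):
    r, codes, bits = code
    theta = codes / 2 ** bits * 2 * np.pi - np.pi   # representative angle of each bin
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1).reshape(-1)

v = np.random.default_rng(1).standard_normal(128).astype(np.float32)
vhat = polar_decode(polar_encode(v, bits=6))
# per-pair error is bounded by r * pi / 2**bits, so reconstruction is already close
print(np.linalg.norm(v - vhat) / np.linalg.norm(v))
```

Since angle error is bounded by π/2^b regardless of the data, no codebook training is needed, which is where the data-oblivious property comes from.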
Parameter Selection
| Use Case | Bits | Projections | Notes |
|---|---|---|---|
| Semantic search | 4-8 | dim/4 | Best accuracy for retrieval |
| KV cache | 3-6 | dim/8 | Memory vs attention quality |
| Maximum compression | 3 | dim/16 | Still provably unbiased |
| Lightweight similarity | — | dim/4 | QJL standalone (1-bit sketches) |
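As a back-of-envelope check on these settings (my own arithmetic, assuming b bits per dimension for the angles plus one sign bit per projection, and ignoring stored radii/norms and metadata):

```python
def compressed_bytes(dim, bits, projections):
    """Approximate code size: bits per dimension plus 1 residual sign bit per projection."""
    return (dim * bits + projections) / 8

f32_bytes = 768 * 4                              # 3072 bytes for an uncompressed f32 vector
code_bytes = compressed_bytes(768, 4, 768 // 4)  # semantic-search row: 4 bits, dim/4 projections
print(code_bytes, f32_bytes / code_bytes)        # 408.0 bytes, roughly 7.5x smaller
```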
Feature Flags
| Feature | Default | Description |
|---|---|---|
| std | Yes | Standard library (nalgebra QR, full rotation) |
| alloc | No | Heap allocation without std (Vec via alloc crate) |
| serde-support | Yes | Serde serialization for all types |
| simd | No | Hand-tuned NEON/AVX2 kernels |
| parallel | No | Parallel batch operations via rayon |
| tracing-support | No | OpenTelemetry-compatible instrumentation |
| ffi | No | C FFI exports for cross-language bindings |
no_std Support
BitPolar works on embedded/edge targets with no_std:
```toml
[dependencies]
bitpolar = { version = "0.3", default-features = false, features = ["alloc"] }
```

Uses libm for math functions and alloc for Vec/String. The Walsh-Hadamard rotation is available without std, unlike Haar QR, which requires nalgebra.
Traits
BitPolar exposes composable traits for ecosystem integration:
- VectorQuantizer — core encode/decode/IP/L2 interface
- BatchQuantizer — parallel batch operations (behind the parallel feature)
- RotationStrategy — pluggable rotation (QR, Walsh-Hadamard, identity)
- SerializableCode — compact binary serialization
Examples
30 Python examples, 9 Rust examples, plus JavaScript, Go, and Java examples.
```bash
# Rust
cargo run --example search_vector_database
cargo run --example llm_kv_cache

# Python (30 examples covering all 58 integrations)
python examples/python/01_quickstart.py              # Core API
python examples/python/12_pytorch_quantizer.py       # PyTorch integration
python examples/python/13_llamaindex_vectorstore.py  # LlamaIndex
python examples/python/14_faiss_dropin.py            # FAISS replacement
python examples/python/18_openai_agents_tool.py      # OpenAI Agents
python examples/python/23_vector_databases.py        # DuckDB, SQLite, etc.
python examples/python/30_complete_rag.py            # End-to-end RAG pipeline
```
See examples/README.md for the full list.
Performance
Run benchmarks:
```bash
cargo bench
```
References
- TurboQuant (ICLR 2026): arXiv 2504.19874
- PolarQuant (AISTATS 2026): arXiv 2502.02617
- QJL (AAAI 2025): arXiv 2406.03482
Contributing
Contributions are welcome! See CONTRIBUTING.md for development setup, coding standards, and how to add a new quantization strategy.
License
Licensed under either of:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
at your option.